GutenTag: A Multi-Term Caching Optimized Tag Query Processor for Key-Value Based NoSQL Storage Systems

Authors

  • Christian von der Weth
  • Anwitaman Datta
Abstract

NoSQL systems are increasingly deployed as back-end infrastructure for large-scale distributed online platforms like Google, Amazon or Facebook. Their applicability results from the fact that most services of online platforms access the stored data objects via their primary key. However, NoSQL systems do not efficiently support services that refer to more than one data object, e.g. term-based search for data objects. To address this issue, we propose an architecture based on an inverted index on top of a NoSQL system. For queries comprising more than one term, distributed indices yield limited performance in large distributed systems. We propose two extensions to cope with this challenge. Firstly, we store index entries not only for single terms but also for a selected set of term combinations, chosen according to their popularity as derived from a query history. Secondly, we additionally cache popular keys on gateway nodes, which are a common concept in real-world systems and act as the interface for services when accessing data objects in the back end. Our results show that we can significantly reduce the bandwidth consumption for processing queries, with an acceptable, marginal increase in the load of the gateway nodes.
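The first extension described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `TagIndex`, its methods, the `min_count` threshold, the restriction to term pairs, and the greedy key selection are all assumptions made for illustration, and a plain Python dict stands in for the key-value store. The count of posting entries fetched is used as a rough proxy for bandwidth. Gateway-side caching, the second extension, is left out.

```python
from collections import Counter
from itertools import combinations


class TagIndex:
    """Toy inverted index; a dict stands in for the key-value store (assumption)."""

    def __init__(self):
        # key: frozenset of terms -> set of object ids (the posting list)
        self.store = {}

    def add_object(self, obj_id, tags):
        """Index an object under each of its single tags."""
        for tag in tags:
            self.store.setdefault(frozenset([tag]), set()).add(obj_id)

    def add_popular_combinations(self, query_history, min_count=2):
        """Materialize entries for term pairs seen at least min_count times.

        Entries are computed once from the current single-term postings;
        keeping them up to date on later inserts is not handled here.
        """
        counts = Counter()
        for query in query_history:
            for pair in combinations(sorted(set(query)), 2):
                counts[frozenset(pair)] += 1
        for key, seen in counts.items():
            if seen >= min_count:
                postings = [self.store.get(frozenset([t]), set()) for t in key]
                self.store[key] = set.intersection(*postings)

    def query(self, terms):
        """Return (matching object ids, number of posting entries fetched)."""
        uncovered = set(terms)
        keys = []
        # Greedily cover the query with stored pair keys, then single terms.
        for pair in combinations(sorted(uncovered), 2):
            key = frozenset(pair)
            if key in self.store and key <= uncovered:
                keys.append(key)
                uncovered -= key
        keys.extend(frozenset([t]) for t in sorted(uncovered))

        result, fetched = None, 0
        for key in keys:
            postings = self.store.get(key, set())
            fetched += len(postings)
            result = set(postings) if result is None else result & postings
        return (result or set()), fetched
```

In this sketch, a two-term query whose pair has been materialized fetches one short posting list instead of two longer ones, which is the intuition behind the bandwidth reduction the abstract reports. The cost is extra storage for the combination entries, so which combinations to keep is the central design choice.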

Similar resources

Multi-structured Redundancy

One-size-fits-all solutions have not worked well in storage systems. This is true in the enterprise where noSQL, Map-Reduce and column-stores have added value to traditional database workloads. This is also true outside the enterprise. A recent paper [7] illustrated that even the single-desktop store is a rich mixture of file systems, databases and key-value stores. Yet, in research one-size-fi...

Workload-Aware RDF Partitioning and SPARQL Query Caching for Massive RDF Graphs stored in NoSQL Databases

Governments, corporations, startups, open data initiatives and other organizations are increasingly considering RDF and SPARQL in a broad range of information management scenarios. Reducing SPARQL query times has been the main concern of virtually all recent RDF triplestores, yet SPARQL caching techniques have not been broadly considered. In this paper we present Rendezvous, a middleware...

Multiterm Keyword Searching For Key Value Based NoSQL System

Today, the enterprise landscape faces a large amount of data. The information gathered from these data sources is useful for improving product and service delivery. However, it is challenging to perform searching activities on these data sources because of their unstructured nature. Due to the unstructured nature of these data, NoSQL storage has been adopted by many enterprises because it provides ...

Distributed NoSQL Storage for Extreme-Scale System Services

Today, with rapidly accumulating data, data-driven applications are emerging in science and commercial areas. On both HPC systems and clouds, the continuously widening performance gap between storage and computing resources prevents us from building scalable data-intensive systems. Distributed NoSQL storage systems are known for their ease of use and attractive performance and are increasingly u...

XQuery processing over NoSQL stores

Using NoSQL stores as the storage layer for the execution of declarative query processing using XQuery provides a high-level interface to process data in an optimized manner. The term NoSQL refers to a plethora of new stores which essentially trade off well-known ACID properties for higher availability or scalability, using techniques such as eventual consistency, horizontal scalability, efficient ...

Journal:
  • CoRR

Volume abs/1105.4452  Issue 

Pages  -

Publication date 2011